
    Combining Processor Virtualization and Split Compilation for Heterogeneous Multicore Embedded Systems

    Complex embedded systems have always been heterogeneous multicore systems. Because of the tight constraints on power, performance, and cost, this situation is unlikely to change any time soon. As a result, the software environments required to program those systems have become very complex too. We propose to apply instruction-set virtualization and just-in-time compilation techniques to program heterogeneous multicore embedded systems, with several additional requirements:
    * the environment must be able to compile legacy C/C++ code to a target-independent intermediate representation;
    * the just-in-time (JIT) compiler must generate high-performance code;
    * the technology must be able to program the whole system, not just the host processor.
    Advantages of such an environment include much simpler software engineering, reduced maintenance costs, and fewer legacy-code problems. It also goes beyond mere binary compatibility by exploiting the hardware platform better. We also propose to combine processor virtualization with split compilation to improve the performance of the JIT compiler. By taking advantage of the two-step compilation process, we aim to make it possible to run very aggressive optimizations online, even on a very constrained system.
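    The split-compilation idea (an expensive offline step annotates the intermediate representation, and a lightweight online step merely exploits those annotations) can be illustrated with a toy sketch. The IR shape and the constant-folding annotation below are our own hypothetical examples, not the actual toolchain:

```python
# Toy split compilation: the offline compiler runs the costly analysis and
# records its result as an annotation; the JIT only reads the annotation.
def offline_compile(source_ops):
    """Annotate each (op, a, b) tuple with whether it can be folded."""
    ir = []
    for op, a, b in source_ops:
        ir.append({"op": op, "a": a, "b": b,
                   "foldable": isinstance(a, int) and isinstance(b, int)})
    return ir

def jit_compile(ir):
    """Online step: trust the annotation and fold constants without re-analysis."""
    code = []
    for node in ir:
        if node["op"] == "add" and node["foldable"]:
            code.append(("const", node["a"] + node["b"]))  # folded offline knowledge
        else:
            code.append((node["op"], node["a"], node["b"]))
    return code
```

    The costly part (here, trivially, the type check; in reality, whole-program analyses) runs once offline, so the online step stays cheap enough for a constrained device.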

    Infrastructures and Compilation Strategies for Fine-Grained Parallelism

    The increasing complexity of processors has led to the development of a large number of code transformations to adapt computations to the hardware architecture. The major difficulty faced by a compiler is to determine the sequence of transformations that will provide the best performance. This sequence depends on the application and the processor considered. The deep interaction between the various code transformations makes it impossible to find a static solution. We propose an iterative approach to compilation to solve this problem: each optimization module can revisit the decisions made by another module. These modules can exchange information about the properties of the code they have produced. This approach requires a complete redesign of the structure of current compilers. Its realization was only made possible by the software infrastructures that we developed: Salto and SEA. Thanks to these environments, we were able to quickly develop prototypes of compilation strategies. We also show that analysis and optimization should not be limited to the local behavior of a code fragment. On the contrary, the global behavior of the application must be considered, especially for embedded systems.
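    The iterative scheme described above can be sketched as a search over pass orderings. The passes and the scalar cost model below are hypothetical stand-ins for real transformations and real performance measurements:

```python
# Toy iterative compilation: instead of a fixed static pass order, try
# several orderings, measure each one, and keep the best.
from itertools import permutations

def unroll(cost):
    return cost * 0.9                         # hypothetical effect: 10% faster

def fuse(cost):
    return cost - 10 if cost > 50 else cost   # hypothetical: pays off only on large loops

def best_sequence(initial_cost, passes):
    """Try every ordering of `passes`; return (best cost, best pass order)."""
    best = (initial_cost, ())
    for order in permutations(passes):
        cost = initial_cost
        for p in order:
            cost = p(cost)                    # one module revisits the other's result
        if cost < best[0]:
            best = (cost, tuple(p.__name__ for p in order))
    return best
```

    Real iterative compilers prune this search and feed analysis results between modules rather than enumerating all orderings, but the principle (decide by measuring, not statically) is the same.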

    Infrastructures and Compilation Strategies for the Performance of Computing Systems

    This document presents our main contributions to the field of compilation, and more generally to the quest for performance of computing systems. It is structured by type of execution environment, from static compilation (execution of native code), to JIT compilation, and purely dynamic optimization. We also consider interpreters. In each chapter, we focus on the most relevant contributions. Chapter 2 describes our work on static compilation. It covers a long time frame (from PhD work in 1995--1998 to recent work on real-time systems and worst-case execution times at Inria in 2015) and various positions, both in academia and in industry. My research on JIT compilers started in the mid-2000s at STMicroelectronics, and is still ongoing. Chapter 3 covers the results we obtained on various aspects of JIT compilers: split compilation, interaction with real-time systems, and obfuscation. Chapter 4 reports on dynamic binary optimization, a research effort started more recently, in 2012. It considers the optimization of a native binary (without source code) while it runs, which raises significant challenges but also opportunities. Interpreters represent an alternative way to execute code. Instead of generating native code, an interpreter executes an infinite loop that continuously reads an instruction, decodes it, and executes its semantics. Interpreters are much easier to develop than compilers, and they are also much more portable, often requiring only a simple recompilation. The price to pay is reduced performance. Chapter 5 presents some of our work related to interpreters. All this research often required significant software infrastructures for validation, from early prototypes to robust quasi-products, and from open source to proprietary. We detail them in Chapter 6. The last chapter concludes and gives some perspectives.
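    The fetch-decode-execute loop described above can be sketched in a few lines. The stack-based bytecode is a toy of our own invention, not one of the interpreters studied in the document:

```python
# Minimal interpreter: an infinite loop that fetches an instruction,
# decodes its opcode, and executes its semantics.
def interpret(bytecode):
    """Run a list of (opcode, operand) pairs; return the top of the stack."""
    stack = []
    pc = 0
    while True:                      # the "infinite loop"
        op, arg = bytecode[pc]       # fetch
        pc += 1
        if op == "PUSH":             # decode + execute semantics
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "HALT":
            return stack.pop()

# Computes (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None), ("HALT", None)]
```

    Porting such an interpreter to a new machine only requires recompiling this loop, which is exactly why interpreters trade performance for portability.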

    So Far So Good: Self-Adaptive Dynamic Checkpointing for Intermittent Computation based on Self-Modifying Code

    Recently, different software- and hardware-based checkpointing strategies have been proposed to ensure forward progress of execution on energy-harvesting IoT devices. In this work, inspired by ideas used in dynamic compilers, we propose SFSG: a dynamic strategy that shifts checkpoint placement and specialization to runtime and makes decisions based on past power failures and the execution paths taken before each power failure. The goal of SFSG is to provide forward progress and to avoid non-termination without using hardware features or programmer intervention. We evaluate SFSG on a TI MSP430 device, with different types of benchmarks as well as different uninterrupted power intervals, and we evaluate it in terms of the number of checkpoints placed and its runtime overhead.
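    The runtime placement policy can be illustrated with a highly simplified model. Our assumptions (a straight-line program, a fixed power budget per boot, and a checkpoint placed one step before the last observed failure point) are stand-ins for the real SFSG policy, which is richer:

```python
# Toy intermittent-computation model: power fails `budget` steps after each
# reboot; after a failure the checkpoint is moved just before the observed
# failure point, so every reboot makes forward progress (needs budget >= 2).
def simulate(n_steps, budget):
    """Return the number of reboots needed to finish `n_steps` instructions."""
    checkpoint = 0
    reboots = 0
    while True:
        if n_steps - checkpoint <= budget:
            return reboots                   # enough energy left to finish
        failure_at = checkpoint + budget     # power fails here
        checkpoint = failure_at - 1          # adapt placement for the next boot
        reboots += 1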

    The Pitfalls of Benchmarking with Applications

    Application benchmarking is a widely trusted method of performance evaluation. Compiler developers rely on benchmarks to assess the correctness and performance of their optimizations; computer vendors use them to compare their respective machines; processor architects run them to tune innovative features and, to a lesser extent, to validate their correctness. Benchmarks must reflect actual workloads of interest and return a synthetic measure of "performance". Often, benchmarks are simply a collection of real-world applications run as black boxes. We identify a number of pitfalls that derive from using applications as benchmarks, and we illustrate them with a popular, freely available benchmark suite. In particular, we argue that correctness should be defined by an expert of the application domain, and that the correctness test should be integrated in the benchmark itself.

    Predictable Binary Code Cache: A First Step Towards Reconciling Predictability and Just-In-Time Compilation

    Virtualization and just-in-time (JIT) compilation have become important paradigms in computer science to address application portability issues without deteriorating average-case performance. Unfortunately, JIT compilation raises predictability issues, which currently hinder its adoption in real-time applications. Our work aims at reconciling the two domains, i.e., taking advantage of the portability and performance provided by JIT compilation while providing predictability guarantees. As a first step towards this ambitious goal, we study two structures of code caches and demonstrate their predictability. On the one hand, the studied binary code caches avoid overly frequent function recompilations, providing good average-case performance. On the other hand, and more importantly for system determinism, we show that the behavior of the code cache is predictable: a safe upper bound on the number of function recompilations can be computed, enabling the verification of timing constraints. Experimental results show that fixing function addresses in the binary cache ahead of time results in tighter Worst-Case Execution Times (WCETs) than organizing the binary code cache in fixed-size blocks replaced using a Least Recently Used (LRU) policy.
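    The fixed-size-block organization can be modelled with a short sketch, under the simplifying assumption that every function occupies exactly one cache block; counting the recompilation events on a call trace is precisely the quantity a WCET analysis must bound:

```python
# Toy binary code cache: fixed number of one-function blocks, LRU
# replacement. A call to an evicted function forces a recompilation.
def recompilations(call_trace, cache_blocks):
    """Return how many (re)compilations a call trace causes."""
    cache = []                   # least recently used first
    count = 0
    for fn in call_trace:
        if fn in cache:
            cache.remove(fn)     # hit: refresh the LRU position
        else:
            count += 1           # miss: (re)compile the function
            if len(cache) == cache_blocks:
                cache.pop(0)     # evict the least recently used block
        cache.append(fn)
    return count
```

    Because LRU state is analyzable, a static analysis can bound this count for any trace a program can produce, which is what makes the cache predictable.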

    Compile-Time Function Memoization

    Memoization is the technique of saving the results of computations so that future executions can be omitted when the same inputs repeat. Recent work showed that memoization can be applied to dynamically linked pure functions using a load-time technique, and results were encouraging for the demonstrated transcendental functions. A restriction of the proposed framework was that memoization applied only to dynamically linked functions, and those functions had to be determined beforehand. In this work, we propose function memoization using a compile-time technique, thus extending the scope of memoization to user-defined functions as well as making it transparently applicable to any dynamically linked function. Our compile-time technique allows static linking of the memoization code, which increases the benefit of memoization by enabling inlining of the memoization wrapper. Our compile-time analysis can also handle functions with pointer parameters, and we handle constants more efficiently. Instruction-set support can also be considered, and we propose associated hardware leading to additional performance gains.

    Aggressive Memory Speculation in HW/SW Co-Designed Machines

    Single-ISA heterogeneous systems (such as ARM big.LITTLE) are an attractive solution for embedded platforms as they expose performance/energy trade-offs directly to the operating system. Recent works have demonstrated the ability to increase their efficiency by using VLIW cores, supported through Dynamic Binary Translation (DBT) to maintain the illusion of a single-ISA system. However, VLIW cores cannot rival Out-of-Order (OoO) cores when it comes to performance, mainly because they do not use speculative execution. In this work, we study how memory dependency speculation can be used during the DBT process. Our approach enables fine-grained speculation optimizations thanks to a combination of hardware and software. Our results show that it leads to a geometric-mean speed-up of 10% at the price of a 7% area overhead.
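    The speculation mechanism can be modelled in a few lines. This is a deliberately simplified software model of our own; in the paper, the alias check and the rollback are supported by the proposed hardware:

```python
# Toy memory dependency speculation: a load is hoisted above an earlier
# store on the guess that their addresses differ; a runtime alias check
# detects misspeculation and replays the load in program order.
def speculative_store_load(mem, store_addr, store_val, load_addr):
    """Return (loaded value, whether the speculation succeeded)."""
    speculative_value = mem[load_addr]   # load executed early (speculatively)
    mem[store_addr] = store_val          # the store it was hoisted above
    if store_addr == load_addr:          # alias check failed: rollback
        return mem[load_addr], False     # replay the load after the store
    return speculative_value, True       # addresses differ: speculation paid off
```

    When aliasing is rare, the early load hides memory latency on almost every execution, which is the source of the reported speed-up; the check-and-replay path is the price of correctness.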